Automatically Finding Good Clusters with Seed K-Means

Authors

  • Miyoung Shin
  • Eun Mi Kang
  • Seon Hee Park
Abstract

In finding biologically relevant groups of genes from gene expression data obtained by microarray technologies, k-means clustering is one of the most popular approaches because it is easy to use and simple to implement. However, because k-means chooses its initial points at random, reliable results cannot be obtained without repeating the entire clustering process many times [2]. Our goal here is to introduce a novel clustering method, which we call seed k-means clustering, in which a novel algorithm is employed to automatically find good initial seeds for k-means clustering.
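The dependence on initial points can be illustrated with a minimal sketch. The abstract does not describe the authors' seed-finding algorithm, so the code below substitutes a simple farthest-first heuristic purely as a stand-in; the function names (`kmeans`, `farthest_first_seeds`, `within_cluster_ss`) and the toy data are assumptions made for illustration, not part of the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means started from explicitly supplied seeds."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def within_cluster_ss(points, centers, labels):
    """Total within-cluster sum of squares (lower is tighter)."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


def farthest_first_seeds(points, k):
    """Hypothetical stand-in seeding, NOT the paper's algorithm:
    start at the first point, then repeatedly add the point that is
    farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(
            max(points, key=lambda p: min(squared_distance(p, s) for s in seeds))
        )
    return seeds


# Toy "expression profiles": three well-separated groups (made-up data).
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (10.0, 0.0), (10.1, 0.2), (10.2, 0.1),
]

# Deterministic seeds give the same result on every run.
seeded_centers, seeded_labels = kmeans(points, farthest_first_seeds(points, 3))
print("seeded:", within_cluster_ss(points, seeded_centers, seeded_labels))

# Random seeds may or may not reach the same partition, depending on the draw.
for trial in range(5):
    rng = random.Random(trial)
    centers, labels = kmeans(points, rng.sample(points, 3))
    print("random trial", trial, within_cluster_ss(points, centers, labels))
```

Passing the seeds in as an argument keeps the clustering loop itself deterministic, so any run-to-run variation comes only from how the seeds were chosen.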


Similar resources

Validation of k-means and Threshold based Clustering Method

Data mining is a process of extracting interesting hidden information from large databases. It can be applied to many databases, but the kind of patterns to be found is specified by the data mining technique used. Clustering is one of the data mining techniques that partitions a database into clusters such that data objects in the same cluster are similar and data objects belonging to different clusters are ...


Robust partitional clustering by outlier and density insensitive seeding

The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a locally optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed se...


A hybrid clustering technique combining a novel genetic algorithm with K-Means

Many existing clustering techniques including K-Means require a user input on the number of clusters. It is often extremely difficult for a user to accurately estimate the number of clusters in a data set. The genetic algorithms (GAs) generally determine the number of clusters automatically. However, they typically choose the genes and the number of genes randomly. If we can identify the right ...


Validity Measure of Cluster Based On the Intra-Cluster and Inter-Cluster Distance

The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct implementation of the k-means method requires time proportional to the product of the number of patterns and the number of clusters per iteration. This is computationally very expensive, especially for large datasets. The main disadvantage of the k-means algorithm is that the ...


ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means

Keywords: Clustering; Classification; K-Means; Cluster evaluation; Data mining. Abstract: In this paper we present two clustering techniques called ModEx and Seed-Detective. ModEx is a modified version of an existing clustering technique called Ex-Detective. It addresses some limitations of Ex-Detective. Seed-Detective is a combination of ModEx and Simple KMeans. Seed-Detective uses ModEx to produce a set ...




Publication date: 2003